# Multimodal reasoning

## GLM-4.1V-9B-Thinking
**Publisher:** THUDM · **License:** MIT · **Tags:** Image-to-Text, Transformers, Supports Multiple Languages · **Stats:** 163 · 95

GLM-4.1V-9B-Thinking is an open-source vision-language model built on the GLM-4-9B-0414 foundation model. It focuses on improving reasoning in complex tasks and supports a 64k context length and 4K image resolution.
## Kimi-VL-A3B-Thinking-2506
**Publisher:** moonshotai · **License:** MIT · **Tags:** Image-to-Text, Transformers · **Stats:** 515 · 67

Kimi-VL-A3B-Thinking-2506 is an upgraded version of Kimi-VL-A3B-Thinking, with significant improvements in multimodal reasoning, visual perception and understanding, and video scene processing. It supports higher-resolution images and reasons more capably while consuming fewer tokens.
## Magistral-Small-2506-Vision
**Publisher:** OptimusePrime · **License:** Apache-2.0 · **Tags:** Image-to-Text, Safetensors, Supports Multiple Languages · **Stats:** 125 · 5

Magistral-Small-2506-Vision is a reasoning fine-tune of Mistral Small 3.1 trained with GRPO, released as an experimental checkpoint with vision capabilities.
## Stockmark-2-VL-100B-beta
**Publisher:** stockmark · **License:** Other · **Tags:** Image-to-Text, Transformers, Supports Multiple Languages · **Stats:** 184 · 8

Stockmark-2-VL-100B-beta is a Japanese-focused vision-language model with 100 billion parameters. It is equipped with chain-of-thought (CoT) reasoning and can be used for document reading and comprehension.
## InternVL3-8B
**Publisher:** unsloth · **License:** Apache-2.0 · **Tags:** Multimodal Alignment, Transformers · **Stats:** 224 · 1

InternVL3-8B is an advanced multimodal large language model with strong multimodal perception and reasoning capabilities, capable of processing multimodal data such as images and videos.
## InternVL3-1B GGUF
**Publisher:** unsloth · **License:** Apache-2.0 · **Tags:** Multimodal Fusion, Transformers · **Stats:** 868 · 2

InternVL3-1B is an advanced multimodal large language model that excels in multimodal perception and reasoning, and extends to further multimodal capabilities such as tool use and GUI agents.
## VisionReasoner-7B
**Publisher:** Ricky06662 · **License:** Apache-2.0 · **Tags:** Image-to-Text, Transformers, English · **Stats:** 2,398 · 1

VisionReasoner-7B is an image-text-to-text model with a decoupled architecture consisting of a reasoning model and a segmentation model. It can interpret user intentions and generate pixel-level masks.
## Qwen3-8B
**Publisher:** unsloth · **License:** Apache-2.0 · **Tags:** Large Language Model, Transformers · **Stats:** 30.23k · 5

Qwen3-8B is the latest large language model in the Qwen series. It offers a range of advanced features, supports multiple languages, and performs strongly in reasoning, instruction following, and related tasks, delivering a more capable and natural interaction experience.
## Synthia-S1-27b (bnb 4-bit)
**Publisher:** GusPuffy · **Tags:** Text-to-Image, Transformers · **Stats:** 858 · 1

Synthia-S1-27b is an advanced reasoning AI model developed by Tesslate AI, focusing on logical reasoning, coding, and role-playing tasks.
## Gemma 3 27B IT GGUF
**Publisher:** Mungert · **Tags:** Text-to-Image · **Stats:** 4,034 · 6

A GGUF-quantized version of Gemma 3 with 27B parameters, supporting image-text interaction tasks.
## Spec-Vision-V1
**Publisher:** SVECTOR-CORPORATION · **License:** MIT · **Tags:** Text-to-Image, Transformers, Other · **Stats:** 17 · 1

Spec-Vision-V1 is a lightweight, state-of-the-art open-source multimodal model designed for deep integration of visual and textual data, supporting a 128K context length.
## Mulberry-Qwen2VL-7B
**Publisher:** HuanjinYao · **License:** Apache-2.0 · **Tags:** Text-to-Image, Transformers · **Stats:** 13.57k · 1

Mulberry is a step-by-step reasoning model trained on the Mulberry-260K SFT dataset, which was generated through collective knowledge search.
## Meditron-7B LLM Radiology
**Publisher:** nitinaggarwal12 · **License:** Apache-2.0 · **Tags:** Large Language Model, Transformers · **Stats:** 26 · 1

An open-source model released under the Apache-2.0 license; no detailed description has been provided yet.
## DNABERT-S
**Publisher:** zhihan1996 · **License:** Apache-2.0 · **Tags:** Large Language Model, Transformers · **Stats:** 2,815 · 7

An open-source model released under the Apache-2.0 license; refer to the model documentation for its specific functionality.